Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance

Authors

  • Roger Espasa
  • Mateo Valero
Abstract

The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single simultaneous vector multithreaded architecture to execute regular vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that this architecture achieves a sustained performance on regular numerical codes that is 20 times the performance that can be achieved with today's superscalar microprocessors. Moreover, we will show that the architecture can tolerate very large memory latencies, of up to 100 cycles, with a relatively small performance degradation. This high performance is independent of working set size or of locality considerations, since the DLP paradigm allows very efficient exploitation of a high-performance flat memory bandwidth.


Similar resources

Exploiting instruction- and data-level parallelism

Historically, computer architects have taken two different approaches to high-performance computing: instruction-level parallelism and data-level parallelism. The ILP paradigm seeks to execute several instructions each cycle. It does this by exploring a sequential instruction stream and extracting independent instructions to send to several execution units in parallel. The DLP paradigm, on the ...


Performance Advantages of Merging Instruction- and Data-Level Parallelism

This report presents a new architecture based on adding a vector pipeline to a superscalar microprocessor. The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute regular vectorizable code at a performance level that cannot be achieved using only ILP techniques. We present an analysis of the ...


A case for merging the ILP and DLP paradigms

The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that ...


A Vector-µSIMD-VLIW Architecture for Multimedia Applications

Media processing has motivated strong changes in the focus and design of processors. These applications are composed of heterogeneous regions of code, some of them with high levels of DLP and others with only modest amounts of ILP. A common approach to dealing with these applications is to use μSIMD-VLIW processors. However, the ILP regions fail to scale when we increase the width of the machine, wh...


Simultaneous Multithreading

Current research in processor technology and computer architecture is motivated primarily by the need for greater performance. In this context, it is well understood that the performance gain from improving the memory system alone is limited, and using system-level integration (such as supporting graphics/sound on chip) can only lead to marginal performance benefits. The most significant gain c...




Publication date: 1997